India Rainfall Analysis¶

Motivation and Description¶

Monsoon prediction is clearly of great importance for India.Two types of rainfall predictions can be done, They are

  • Long term predictions: Predict rainfall over few weeks/months in advance.
  • Short term predictions: Predict rainfall a few days in advance in specific locations.

    Indian meteorological department provides forecasting data required for project. In this project we are planning to work on long term predictions of rainfall. The main motive of the project is to predict the amount of rainfall in a particular division or state well in advance. We predict the amount of rainfall using past data.

Dataset¶

  • Dataset1(dataset1) This dataset has average rainfall from 1951-2000 for each district, for every month.
  • Dataset2(dataset2) This dataset has average rainfall for every year from 1901-2005 for each state.

Methodology¶

  • Converting data in to the correct format to conduct experiments.
  • Make a good analysis of data and observe variation in the patterns of rainfall.
  • Finally, we try to predict the average rainfall by separating data into training and testing. We apply various statistical and machine learning approaches(SVM, etc) in prediction and make analysis over various approaches. By using various approaches we try to minimize the error.
In [59]:
import numpy as np # linear algebra #natrix
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
In [60]:
# import plotly.express as px
# df = px.data.iris()
# fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species", marginal_y="violin",
#            marginal_x="box", trendline="ols", template="simple_white")
# fig.show()

Types of graphs¶

  • Bar graphs showing distribution of amount of rainfall.
  • Distribution of amount of rainfall yearly, monthly, groups of months.
  • Distribution of rainfall in subdivisions, districts form each month, groups of months.
  • Heat maps showing correlation between amount of rainfall between months.
In [61]:
data = pd.read_csv(r"H:/4th Year/Sem 8/MaP2/rainfall-prediction-master/data/rainfall_in_india_1901-2015.csv",sep=",")
# data = data.fillna(data.mean())
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4116 entries, 0 to 4115
Data columns (total 19 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   SUBDIVISION  4116 non-null   object 
 1   YEAR         4116 non-null   int64  
 2   JAN          4112 non-null   float64
 3   FEB          4113 non-null   float64
 4   MAR          4110 non-null   float64
 5   APR          4112 non-null   float64
 6   MAY          4113 non-null   float64
 7   JUN          4111 non-null   float64
 8   JUL          4109 non-null   float64
 9   AUG          4112 non-null   float64
 10  SEP          4110 non-null   float64
 11  OCT          4109 non-null   float64
 12  NOV          4105 non-null   float64
 13  DEC          4106 non-null   float64
 14  ANNUAL       4090 non-null   float64
 15  Jan-Feb      4110 non-null   float64
 16  Mar-May      4107 non-null   float64
 17  Jun-Sep      4106 non-null   float64
 18  Oct-Dec      4103 non-null   float64
dtypes: float64(17), int64(1), object(1)
memory usage: 611.1+ KB

Dataset-1 Description¶

  • Data has 36 sub divisions and 19 attributes (individual months, annual, combinations of 3 consecutive months).
  • For some of the subdivisions data is from 1950 to 2005.
  • All the attributes has the sum of amount of rainfall in mm.
In [62]:
data = data.fillna(data.mean(numeric_only = True))
In [63]:
data.head()
Out[63]:
SUBDIVISION YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC ANNUAL Jan-Feb Mar-May Jun-Sep Oct-Dec
0 ANDAMAN & NICOBAR ISLANDS 1901 49.2 87.1 29.2 2.3 528.8 517.5 365.1 481.1 332.6 388.5 558.2 33.6 3373.2 136.3 560.3 1696.3 980.3
1 ANDAMAN & NICOBAR ISLANDS 1902 0.0 159.8 12.2 0.0 446.1 537.1 228.9 753.7 666.2 197.2 359.0 160.5 3520.7 159.8 458.3 2185.9 716.7
2 ANDAMAN & NICOBAR ISLANDS 1903 12.7 144.0 0.0 1.0 235.1 479.9 728.4 326.7 339.0 181.2 284.4 225.0 2957.4 156.7 236.1 1874.0 690.6
3 ANDAMAN & NICOBAR ISLANDS 1904 9.4 14.7 0.0 202.4 304.5 495.1 502.0 160.1 820.4 222.2 308.7 40.1 3079.6 24.1 506.9 1977.6 571.0
4 ANDAMAN & NICOBAR ISLANDS 1905 1.3 0.0 3.3 26.9 279.5 628.7 368.7 330.5 297.0 260.7 25.4 344.7 2566.7 1.3 309.7 1624.9 630.8
In [64]:
data.describe()
Out[64]:
YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC ANNUAL Jan-Feb Mar-May Jun-Sep Oct-Dec
count 4116.000000 4116.000000 4116.000000 4116.000000 4116.000000 4116.000000 4116.000000 4116.000000 4116.000000 4116.000000 4116.000000 4116.000000 4116.000000 4116.000000 4116.000000 4116.000000 4116.000000 4116.000000
mean 1958.218659 18.957320 21.805325 27.359197 43.127432 85.745417 230.234444 347.214334 290.263497 197.361922 95.507009 39.866163 18.870580 1411.008900 40.747786 155.901753 1064.724769 154.100487
std 33.140898 33.569044 35.896396 46.925176 67.798192 123.189974 234.568120 269.310313 188.678707 135.309591 99.434452 68.593545 42.318098 900.986632 59.265023 201.096692 706.881054 166.678751
min 1901.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.400000 0.000000 0.000000 0.100000 0.000000 0.000000 0.000000 62.300000 0.000000 0.000000 57.400000 0.000000
25% 1930.000000 0.600000 0.600000 1.000000 3.000000 8.600000 70.475000 175.900000 156.150000 100.600000 14.600000 0.700000 0.100000 806.450000 4.100000 24.200000 574.375000 34.200000
50% 1958.000000 6.000000 6.700000 7.900000 15.700000 36.700000 138.900000 284.900000 259.500000 174.100000 65.750000 9.700000 3.100000 1125.450000 19.300000 75.200000 882.250000 98.800000
75% 1987.000000 22.125000 26.800000 31.225000 49.825000 96.825000 304.950000 418.225000 377.725000 265.725000 148.300000 45.825000 17.700000 1635.100000 50.300000 196.900000 1287.550000 212.600000
max 2015.000000 583.700000 403.500000 605.600000 595.100000 1168.600000 1609.900000 2362.800000 1664.600000 1222.000000 948.300000 648.900000 617.500000 6331.100000 699.500000 1745.800000 4536.900000 1252.500000
In [65]:
data.hist(figsize=(24,24));
No description has been provided for this image

Observations¶

  • Above histograms show the distribution of rainfall over months.
  • Observed increase in amount of rainfall over months July, August, September.
In [66]:
data.groupby("YEAR").sum()['ANNUAL'].plot(figsize=(12,8));
No description has been provided for this image

Observations¶

  • Shows distribution of rainfall over years.
  • Observed high amount of rainfall in 1950s.
In [67]:
data[['YEAR', 'JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']].groupby("YEAR").sum().plot(figsize=(13,8));
No description has been provided for this image
In [68]:
px.line(data[['YEAR','Jan-Feb', 'Mar-May',
       'Jun-Sep', 'Oct-Dec']])
In [69]:
data[['YEAR','Jan-Feb', 'Mar-May',
       'Jun-Sep', 'Oct-Dec']].groupby("YEAR").sum().plot(figsize=(13,8));
No description has been provided for this image

Observations¶

  • The above two graphs show the distribution of rainfall over months.
  • The graphs clearly shows that amount of rainfall in high in the months july, aug, sep which is monsoon season in India.
In [70]:
data[['SUBDIVISION', 'JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']].groupby("SUBDIVISION").mean().plot.barh(stacked=True,figsize=(13,8));
No description has been provided for this image
In [71]:
data[['SUBDIVISION', 'Jan-Feb', 'Mar-May',
       'Jun-Sep', 'Oct-Dec']].groupby("SUBDIVISION").sum().plot.barh(stacked=True,figsize=(16,8));
No description has been provided for this image
In [72]:
px.box(data[['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']])

Observations¶

  • Above two graphs shows that the amount of rainfall is reasonably good in the months of march, april, may in eastern India.
In [73]:
plt.figure(figsize=(11,4))
sns.heatmap(data[['Jan-Feb','Mar-May','Jun-Sep','Oct-Dec','ANNUAL']].corr(),annot=True)
plt.show()
No description has been provided for this image
In [74]:
plt.figure(figsize=(11,4))
sns.heatmap(data[['JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC','ANNUAL']].corr(),annot=True)
plt.show()
No description has been provided for this image
In [75]:
px.scatter(data[['Jan-Feb','Mar-May','Jun-Sep','Oct-Dec','ANNUAL']])

Observations¶

  • Heat Map shows the co-relation(dependency) betwenn the amounts of rainfall over months.
  • From above it is clear that if amount of rainfall is high in the months of july, august, september then the amount of rainfall will be high annually.
  • It is also obwserved that if amount of rainfall in good in the months of october, november, december then the rainfall is going to b good in the overall year.
In [76]:
#Function to plot the graphs
def plot_graphs(groundtruth,prediction,title):        
    N = 9
    ind = np.arange(N)  # the x locations for the groups
    width = 0.27       # the width of the bars

    fig = plt.figure()
    fig.suptitle(title, fontsize=12)
    ax = fig.add_subplot(111)
    rects1 = ax.bar(ind, groundtruth, width, color='b')
    rects2 = ax.bar(ind+width, prediction, width, color='g')

    ax.set_xlabel("Month of the Year")
    ax.set_ylabel("Amount of rainfall")
    ax.set_xticks(ind+width)
    ax.set_xticklabels( ('APR', 'MAY', 'JUN', 'JUL','AUG', 'SEP', 'OCT', 'NOV', 'DEC') )
    ax.legend( (rects1[0], rects2[0]), ('Ground truth', 'Prediction') )

#     autolabel(rects1)
    for rect in rects1:
        h = rect.get_height()
        ax.text(rect.get_x()+rect.get_width()/2., 1.05*h, '%d'%int(h),
                ha='center', va='bottom')
    for rect in rects2:
        h = rect.get_height()
        ax.text(rect.get_x()+rect.get_width()/2., 1.05*h, '%d'%int(h),
                ha='center', va='bottom')
#     autolabel(rects2)

    plt.show()

Predictions¶

  • For prediction we formatted data in the way, given the rainfall in the last three months we try to predict the rainfall in the next consecutive month.
  • For all the experiments we used 80:20 training and test ratio.
    • Linear regression
    • SVR
    • Artificial neural nets
  • Tersting metrics: We used Mean absolute error to train the models.
  • We also shown the amount of rainfall actually and predicted with the histogram plots.
  • We did two types of trainings once training on complete dataset and other with training with only telangana data
  • All means are standard deviation observations are written, first one represents ground truth, second one represents predictions.
In [77]:
# seperation of training and testing data
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

division_data = np.asarray(data[['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']])

X = None; y = None
for i in range(division_data.shape[1]-3):
    if X is None:
        X = division_data[:, i:i+3]
        y = division_data[:, i+3]
    else:
        X = np.concatenate((X, division_data[:, i:i+3]), axis=0)
        y = np.concatenate((y, division_data[:, i+3]), axis=0)
        
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
In [78]:
#test 2010
temp = data[['SUBDIVISION','JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']].loc[data['YEAR'] == 2010]

data_2010 = np.asarray(temp[['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']].loc[temp['SUBDIVISION'] == 'TELANGANA'])

X_year_2010 = None; y_year_2010 = None
for i in range(data_2010.shape[1]-3):
    if X_year_2010 is None:
        X_year_2010 = data_2010[:, i:i+3]
        y_year_2010 = data_2010[:, i+3]
    else:
        X_year_2010 = np.concatenate((X_year_2010, data_2010[:, i:i+3]), axis=0)
        y_year_2010 = np.concatenate((y_year_2010, data_2010[:, i+3]), axis=0)
In [79]:
#test 2005
temp = data[['SUBDIVISION','JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']].loc[data['YEAR'] == 2005]

data_2005 = np.asarray(temp[['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']].loc[temp['SUBDIVISION'] == 'TELANGANA'])

X_year_2005 = None; y_year_2005 = None
for i in range(data_2005.shape[1]-3):
    if X_year_2005 is None:
        X_year_2005 = data_2005[:, i:i+3]
        y_year_2005 = data_2005[:, i+3]
    else:
        X_year_2005 = np.concatenate((X_year_2005, data_2005[:, i:i+3]), axis=0)
        y_year_2005 = np.concatenate((y_year_2005, data_2005[:, i+3]), axis=0)
In [80]:
#terst 2005
temp = data[['SUBDIVISION','JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']].loc[data['YEAR'] == 2015]

data_2015 = np.asarray(temp[['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']].loc[temp['SUBDIVISION'] == 'TELANGANA'])

X_year_2015 = None; y_year_2015 = None
for i in range(data_2015.shape[1]-3):
    if X_year_2015 is None:
        X_year_2015 = data_2015[:, i:i+3]
        y_year_2015 = data_2015[:, i+3]
    else:
        X_year_2015 = np.concatenate((X_year_2015, data_2015[:, i:i+3]), axis=0)
        y_year_2015 = np.concatenate((y_year_2015, data_2015[:, i+3]), axis=0)
In [81]:
from sklearn import linear_model

# linear model
reg = linear_model.ElasticNet(alpha=0.5)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
print (mean_absolute_error(y_test, y_pred))
96.32435229744083
In [82]:
#2005
y_year_pred_2005 = reg.predict(X_year_2005)


#2010
y_year_pred_2010 = reg.predict(X_year_2010)
    
y_year_pred_2015 = reg.predict(X_year_2015)

print ("MEAN 2005")
print (np.mean(y_year_2005),np.mean(y_year_pred_2005))
print ("Standard deviation 2005")
print (np.sqrt(np.var(y_year_2005)),np.sqrt(np.var(y_year_pred_2005)))


print ("MEAN 2010")
print (np.mean(y_year_2010),np.mean(y_year_pred_2010))
print ("Standard deviation 2010")
print (np.sqrt(np.var(y_year_2010)),np.sqrt(np.var(y_year_pred_2010)))


print ("MEAN 2015")
print (np.mean(y_year_2015),np.mean(y_year_pred_2015))
print ("Standard deviation 2015")
print (np.sqrt(np.var(y_year_2015)),np.sqrt(np.var(y_year_pred_2015)))


plot_graphs(y_year_2005,y_year_pred_2005,"Year-2005")
plot_graphs(y_year_2010,y_year_pred_2010,"Year-2010")
plot_graphs(y_year_2015,y_year_pred_2015,"Year-2015")
# px.bar(y_year_2015,y_year_pred_2015)
MEAN 2005
121.2111111111111 134.68699821349804
Standard deviation 2005
123.77066107608005 90.86310230416439
MEAN 2010
139.93333333333334 144.80501326515912
Standard deviation 2010
135.71320250194282 95.94931363601724
MEAN 2015
88.52222222222223 119.64752006738831
Standard deviation 2015
86.62446123324875 62.36355370163372
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
In [83]:
from sklearn.svm import SVR

# SVM model
clf = SVR(gamma='auto', C=0.1, epsilon=0.2)
clf.fit(X_train, y_train) 
y_pred = clf.predict(X_test)
print (mean_absolute_error(y_test, y_pred))
127.1600615632603
In [84]:
#2005
y_year_pred_2005 = reg.predict(X_year_2005)

#2010
y_year_pred_2010 = reg.predict(X_year_2010)
    
#2015
y_year_pred_2015 = reg.predict(X_year_2015)

print ("MEAN 2005")
print (np.mean(y_year_2005),np.mean(y_year_pred_2005))
print ("Standard deviation 2005")
print (np.sqrt(np.var(y_year_2005)),np.sqrt(np.var(y_year_pred_2005)))



print ("MEAN 2010")
print (np.mean(y_year_2010),np.mean(y_year_pred_2010))
print ("Standard deviation 2010")
print (np.sqrt(np.var(y_year_2010)),np.sqrt(np.var(y_year_pred_2010)))


print ("MEAN 2015")
print (np.mean(y_year_2015),np.mean(y_year_pred_2015))
print ("Standard deviation 2015")
print (np.sqrt(np.var(y_year_2015)),np.sqrt(np.var(y_year_pred_2015)))

plot_graphs(y_year_2005,y_year_pred_2005,"Year-2005")
plot_graphs(y_year_2010,y_year_pred_2010,"Year-2010")
plot_graphs(y_year_2015,y_year_pred_2015,"Year-2015")
MEAN 2005
121.2111111111111 134.68699821349804
Standard deviation 2005
123.77066107608005 90.86310230416439
MEAN 2010
139.93333333333334 144.80501326515912
Standard deviation 2010
135.71320250194282 95.94931363601724
MEAN 2015
88.52222222222223 119.64752006738831
Standard deviation 2015
86.62446123324875 62.36355370163372
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
In [85]:
from keras.models import Model
from keras.layers import Dense, Input, Conv1D, Flatten

# NN model
inputs = Input(shape=(3,1))
x = Conv1D(64, 2, padding='same', activation='elu')(inputs)
x = Conv1D(128, 2, padding='same', activation='elu')(x)
x = Flatten()(x)
x = Dense(128, activation='elu')(x)
x = Dense(64, activation='elu')(x)
x = Dense(32, activation='elu')(x)
x = Dense(1, activation='linear')(x)
model = Model(inputs=[inputs], outputs=[x])
model.compile(loss='mean_squared_error', optimizer='adamax', metrics=['mae'])
model.summary()
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 3, 1)]            0         
                                                                 
 conv1d (Conv1D)             (None, 3, 64)             192       
                                                                 
 conv1d_1 (Conv1D)           (None, 3, 128)            16512     
                                                                 
 flatten (Flatten)           (None, 384)               0         
                                                                 
 dense (Dense)               (None, 128)               49280     
                                                                 
 dense_1 (Dense)             (None, 64)                8256      
                                                                 
 dense_2 (Dense)             (None, 32)                2080      
                                                                 
 dense_3 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 76353 (298.25 KB)
Trainable params: 76353 (298.25 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [86]:
model.fit(x=np.expand_dims(X_train, axis=2), y=y_train, batch_size=64, epochs=30, verbose=1, validation_split=0.1, shuffle=True)
y_pred = model.predict(np.expand_dims(X_test, axis=2))
print (mean_absolute_error(y_test, y_pred))
Epoch 1/30
469/469 [==============================] - 4s 6ms/step - loss: 20048.7051 - mae: 88.0432 - val_loss: 17790.2695 - val_mae: 87.2101
Epoch 2/30
469/469 [==============================] - 3s 6ms/step - loss: 18590.6836 - mae: 86.5963 - val_loss: 17492.6152 - val_mae: 86.1020
Epoch 3/30
469/469 [==============================] - 2s 5ms/step - loss: 18464.1328 - mae: 86.4011 - val_loss: 17713.4082 - val_mae: 88.1468
Epoch 4/30
469/469 [==============================] - 2s 5ms/step - loss: 18494.6621 - mae: 86.2514 - val_loss: 17419.8516 - val_mae: 85.9579
Epoch 5/30
469/469 [==============================] - 3s 6ms/step - loss: 18364.5234 - mae: 86.1242 - val_loss: 18251.0723 - val_mae: 82.8346
Epoch 6/30
469/469 [==============================] - 2s 5ms/step - loss: 18358.2754 - mae: 86.1360 - val_loss: 17427.7363 - val_mae: 85.9941
Epoch 7/30
469/469 [==============================] - 2s 5ms/step - loss: 18261.5645 - mae: 85.8212 - val_loss: 17553.6562 - val_mae: 86.5898
Epoch 8/30
469/469 [==============================] - 2s 5ms/step - loss: 18235.7559 - mae: 85.6793 - val_loss: 17132.9551 - val_mae: 83.5857
Epoch 9/30
469/469 [==============================] - 2s 5ms/step - loss: 18130.5762 - mae: 85.5270 - val_loss: 17429.0723 - val_mae: 85.7730
Epoch 10/30
469/469 [==============================] - 2s 5ms/step - loss: 18151.4941 - mae: 85.4313 - val_loss: 17761.3945 - val_mae: 88.6346
Epoch 11/30
469/469 [==============================] - 2s 5ms/step - loss: 18123.6719 - mae: 85.6369 - val_loss: 17250.7324 - val_mae: 83.2518
Epoch 12/30
469/469 [==============================] - 3s 5ms/step - loss: 18135.6582 - mae: 85.4035 - val_loss: 17464.4434 - val_mae: 87.3533
Epoch 13/30
469/469 [==============================] - 3s 6ms/step - loss: 18025.4121 - mae: 85.2521 - val_loss: 17350.4336 - val_mae: 84.1670
Epoch 14/30
469/469 [==============================] - 3s 6ms/step - loss: 18068.4121 - mae: 85.1873 - val_loss: 17049.9883 - val_mae: 84.7489
Epoch 15/30
469/469 [==============================] - 3s 5ms/step - loss: 17976.3203 - mae: 85.0841 - val_loss: 17243.8965 - val_mae: 84.4475
Epoch 16/30
469/469 [==============================] - 2s 5ms/step - loss: 17980.6465 - mae: 85.0565 - val_loss: 16952.8125 - val_mae: 83.5256
Epoch 17/30
469/469 [==============================] - 3s 6ms/step - loss: 17926.0586 - mae: 84.8479 - val_loss: 17217.4863 - val_mae: 84.8391
Epoch 18/30
469/469 [==============================] - 3s 6ms/step - loss: 17923.0098 - mae: 84.8559 - val_loss: 17166.1367 - val_mae: 85.9009
Epoch 19/30
469/469 [==============================] - 3s 6ms/step - loss: 17890.9004 - mae: 84.7324 - val_loss: 16922.4199 - val_mae: 83.7106
Epoch 20/30
469/469 [==============================] - 3s 6ms/step - loss: 17807.5254 - mae: 84.5616 - val_loss: 17023.0020 - val_mae: 82.3268
Epoch 21/30
469/469 [==============================] - 3s 5ms/step - loss: 17755.3887 - mae: 84.5581 - val_loss: 17115.4102 - val_mae: 83.5145
Epoch 22/30
469/469 [==============================] - 3s 5ms/step - loss: 17743.2715 - mae: 84.4051 - val_loss: 17064.6113 - val_mae: 84.9584
Epoch 23/30
469/469 [==============================] - 3s 5ms/step - loss: 17670.2461 - mae: 84.4333 - val_loss: 17600.5391 - val_mae: 85.8445
Epoch 24/30
469/469 [==============================] - 3s 5ms/step - loss: 17675.9238 - mae: 84.2927 - val_loss: 16890.9570 - val_mae: 82.7249
Epoch 25/30
469/469 [==============================] - 2s 5ms/step - loss: 17621.6992 - mae: 84.0671 - val_loss: 17052.2383 - val_mae: 84.5972
Epoch 26/30
469/469 [==============================] - 2s 5ms/step - loss: 17579.6895 - mae: 84.0536 - val_loss: 16896.7773 - val_mae: 83.1722
Epoch 27/30
469/469 [==============================] - 2s 5ms/step - loss: 17561.6934 - mae: 83.9348 - val_loss: 16902.0176 - val_mae: 84.7980
Epoch 28/30
469/469 [==============================] - 3s 5ms/step - loss: 17521.4219 - mae: 83.9597 - val_loss: 16993.0352 - val_mae: 83.8641
Epoch 29/30
469/469 [==============================] - 2s 5ms/step - loss: 17487.1641 - mae: 83.8764 - val_loss: 17374.5312 - val_mae: 85.5506
Epoch 30/30
469/469 [==============================] - 2s 5ms/step - loss: 17429.5371 - mae: 83.7223 - val_loss: 17012.5020 - val_mae: 82.2100
116/116 [==============================] - 0s 3ms/step
84.45708244925238
In [87]:
#2005
y_year_pred_2005 = reg.predict(X_year_2005)

#2010
y_year_pred_2010 = reg.predict(X_year_2010)
    
#2015
y_year_pred_2015 = reg.predict(X_year_2015)

print ("MEAN 2005")
print (np.mean(y_year_2005),np.mean(y_year_pred_2005))
print ("Standard deviation 2005")
print (np.sqrt(np.var(y_year_2005)),np.sqrt(np.var(y_year_pred_2005)))

print ("MEAN 2010")
print (np.mean(y_year_2010),np.mean(y_year_pred_2010))
print ("Standard deviation 2010")
print (np.sqrt(np.var(y_year_2010)),np.sqrt(np.var(y_year_pred_2010)))



print ("MEAN 2015")
print (np.mean(y_year_2015),np.mean(y_year_pred_2015))
print ("Standard deviation 2015")
print (np.sqrt(np.var(y_year_2015)),np.sqrt(np.var(y_year_pred_2015)))

plot_graphs(y_year_2005,y_year_pred_2005,"Year-2005")
plot_graphs(y_year_2010,y_year_pred_2010,"Year-2010")
plot_graphs(y_year_2015,y_year_pred_2015,"Year-2015")
MEAN 2005
121.2111111111111 134.68699821349804
Standard deviation 2005
123.77066107608005 90.86310230416439
MEAN 2010
139.93333333333334 144.80501326515912
Standard deviation 2010
135.71320250194282 95.94931363601724
MEAN 2015
88.52222222222223 119.64752006738831
Standard deviation 2015
86.62446123324875 62.36355370163372
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
In [88]:
# spliting training and testing data only for telangana
telangana = np.asarray(data[['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']].loc[data['SUBDIVISION'] == 'TELANGANA'])

X = None; y = None
for i in range(telangana.shape[1]-3):
    if X is None:
        X = telangana[:, i:i+3]
        y = telangana[:, i+3]
    else:
        X = np.concatenate((X, telangana[:, i:i+3]), axis=0)
        y = np.concatenate((y, telangana[:, i+3]), axis=0)
        
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.01, random_state=42)        
In [89]:
from sklearn import linear_model

# linear model
reg = linear_model.ElasticNet(alpha=0.5)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
print (mean_absolute_error(y_test, y_pred))
64.72601914484643
In [90]:
#2005
y_year_pred_2005 = reg.predict(X_year_2005)

#2010
y_year_pred_2010 = reg.predict(X_year_2010)
    
#2015
y_year_pred_2015 = reg.predict(X_year_2015)

print ("MEAN 2005")
print (np.mean(y_year_2005),np.mean(y_year_pred_2005))
print ("Standard deviation 2005")
print (np.sqrt(np.var(y_year_2005)),np.sqrt(np.var(y_year_pred_2005)))


print ("MEAN 2010")
print (np.mean(y_year_2010),np.mean(y_year_pred_2010))
print ("Standard deviation 2010")
print (np.sqrt(np.var(y_year_2010)),np.sqrt(np.var(y_year_pred_2010)))

print ("MEAN 2015")
print (np.mean(y_year_2015),np.mean(y_year_pred_2015))
print ("Standard deviation 2015")
print (np.sqrt(np.var(y_year_2015)),np.sqrt(np.var(y_year_pred_2015)))

plot_graphs(y_year_2005,y_year_pred_2005,"Year-2005")
plot_graphs(y_year_2010,y_year_pred_2010,"Year-2010")
plot_graphs(y_year_2015,y_year_pred_2015,"Year-2015")
# sns.scatterplot(data=y_year_2015,x = "month" , y ="rainfall(mm)" , hue = "YEAR")
MEAN 2005
121.2111111111111 106.49798150231581
Standard deviation 2005
123.77066107608005 76.08558540019236
MEAN 2010
139.93333333333334 112.18662987131034
Standard deviation 2010
135.71320250194282 84.35813629737333
MEAN 2015
88.52222222222223 96.76817006572782
Standard deviation 2015
86.62446123324875 52.45304841713268
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
In [91]:
from sklearn.svm import SVR

# SVM model
clf = SVR(kernel='rbf', gamma='auto', C=0.5, epsilon=0.2)
clf.fit(X_train, y_train) 
y_pred = clf.predict(X_test)
print (mean_absolute_error(y_test, y_pred))
115.32415990638656
In [92]:
#2005
y_year_pred_2005 = reg.predict(X_year_2005)

#2010
y_year_pred_2010 = reg.predict(X_year_2010)
    
#2015
y_year_pred_2015 = reg.predict(X_year_2015)

print ("MEAN 2005")
print (np.mean(y_year_2005),np.mean(y_year_pred_2005))
print ("Standard deviation 2005")
print (np.sqrt(np.var(y_year_2005)),np.sqrt(np.var(y_year_pred_2005)))

print ("MEAN 2010")
print (np.mean(y_year_2010),np.mean(y_year_pred_2010))
print ("Standard deviation 2010")
print (np.sqrt(np.var(y_year_2010)),np.sqrt(np.var(y_year_pred_2010)))

print ("MEAN 2015")
print (np.mean(y_year_2015),np.mean(y_year_pred_2015))
print ("Standard deviation 2015")
print (np.sqrt(np.var(y_year_2015)),np.sqrt(np.var(y_year_pred_2015)))

plot_graphs(y_year_2005,y_year_pred_2005,"Year-2005")
plot_graphs(y_year_2010,y_year_pred_2010,"Year-2010")
plot_graphs(y_year_2015,y_year_pred_2015,"Year-2015")
MEAN 2005
121.2111111111111 106.49798150231581
Standard deviation 2005
123.77066107608005 76.08558540019236
MEAN 2010
139.93333333333334 112.18662987131034
Standard deviation 2010
135.71320250194282 84.35813629737333
MEAN 2015
88.52222222222223 96.76817006572782
Standard deviation 2015
86.62446123324875 52.45304841713268
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
In [93]:
model.fit(x=np.expand_dims(X_train, axis=2), y=y_train, batch_size=64, epochs=10, verbose=1, validation_split=0.1, shuffle=True)
y_pred = model.predict(np.expand_dims(X_test, axis=2))
print (mean_absolute_error(y_test, y_pred))
Epoch 1/10
15/15 [==============================] - 0s 12ms/step - loss: 6831.6484 - mae: 59.8372 - val_loss: 4876.1357 - val_mae: 51.4384
Epoch 2/10
15/15 [==============================] - 0s 7ms/step - loss: 6211.3706 - mae: 57.3244 - val_loss: 4587.8960 - val_mae: 51.0172
Epoch 3/10
15/15 [==============================] - 0s 8ms/step - loss: 6016.0649 - mae: 57.2547 - val_loss: 4491.1641 - val_mae: 51.1315
Epoch 4/10
15/15 [==============================] - 0s 7ms/step - loss: 5869.5327 - mae: 57.0192 - val_loss: 4383.1455 - val_mae: 50.2226
Epoch 5/10
15/15 [==============================] - 0s 8ms/step - loss: 5785.2510 - mae: 55.5861 - val_loss: 4274.9116 - val_mae: 49.2660
Epoch 6/10
15/15 [==============================] - 0s 7ms/step - loss: 5707.2461 - mae: 55.2401 - val_loss: 4240.0122 - val_mae: 49.0595
Epoch 7/10
15/15 [==============================] - 0s 7ms/step - loss: 5649.3164 - mae: 54.9786 - val_loss: 4199.9941 - val_mae: 48.6766
Epoch 8/10
15/15 [==============================] - 0s 8ms/step - loss: 5592.8267 - mae: 54.6077 - val_loss: 4156.4116 - val_mae: 48.6950
Epoch 9/10
15/15 [==============================] - 0s 9ms/step - loss: 5551.9995 - mae: 54.5321 - val_loss: 4145.3403 - val_mae: 48.3177
Epoch 10/10
15/15 [==============================] - 0s 7ms/step - loss: 5511.2412 - mae: 53.8805 - val_loss: 4138.8491 - val_mae: 47.8778
1/1 [==============================] - 0s 35ms/step
62.04201758341355
In [94]:
#2005
y_year_pred_2005 = reg.predict(X_year_2005)

#2010
y_year_pred_2010 = reg.predict(X_year_2010)
    
#2015
y_year_pred_2015 = reg.predict(X_year_2015)

print ("MEAN 2005")
print (np.mean(y_year_2005),np.mean(y_year_pred_2005))
print ("Standard deviation 2005")
print (np.sqrt(np.var(y_year_2005)),np.sqrt(np.var(y_year_pred_2005)))

print ("MEAN 2010")
print (np.mean(y_year_2010),np.mean(y_year_pred_2010))
print ("Standard deviation 2010")
print (np.sqrt(np.var(y_year_2010)),np.sqrt(np.var(y_year_pred_2010)))

print ("MEAN 2015")
print (np.mean(y_year_2015),np.mean(y_year_pred_2015))
print ("Standard deviation 2015")
print (np.sqrt(np.var(y_year_2015)),np.sqrt(np.var(y_year_pred_2015)))



plot_graphs(y_year_2005,y_year_pred_2005,"Year-2005")
plot_graphs(y_year_2010,y_year_pred_2010,"Year-2010")
plot_graphs(y_year_2015,y_year_pred_2015,"Year-2015")
MEAN 2005
121.2111111111111 106.49798150231581
Standard deviation 2005
123.77066107608005 76.08558540019236
MEAN 2010
139.93333333333334 112.18662987131034
Standard deviation 2010
135.71320250194282 84.35813629737333
MEAN 2015
88.52222222222223 96.76817006572782
Standard deviation 2015
86.62446123324875 52.45304841713268
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

Prediction Observations¶

Training on complete dataset¶

Algorithm MAE
Linear Regression 94.94821727619338
SVR 127.74073860203839
Artificial neural nets 85.2648713528865

Training on telangana dataset¶

Algorithm MAE
Linear Regression 70.61463829282977
SVR 90.30526775954294
Artificial neural nets 59.95190786532157
  • Neural Networks performs better than SVR etc.
  • Observed MAE is very high which indicates machine learning models won't work well for prediction of rainfall.
  • Telangana data has a single pattern that can be learned by models, rather than learning different patterns of all states. so has high accuracy.
  • Analysed individual year rainfall patterns for 2005, 2010, 2015.
  • Approximately close means, noticed less standard deviations.

District wise details¶

  • Similar to above the number of attributes is same, we don’t have year in this.
  • The amount of rainfall in mm for each district is added from 1950-2000.
  • We analyse the data individually for the state Andhra Pradesh
In [95]:
district = pd.read_csv(r"H:/4th Year/Sem 8/MaP2/rainfall-prediction-master/data/district_wise_rainfall_normal.csv",sep=",")
district = district.fillna(district.mean(numeric_only=True))
district.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 641 entries, 0 to 640
Data columns (total 19 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   STATE_UT_NAME  641 non-null    object 
 1   DISTRICT       641 non-null    object 
 2   JAN            641 non-null    float64
 3   FEB            641 non-null    float64
 4   MAR            641 non-null    float64
 5   APR            641 non-null    float64
 6   MAY            641 non-null    float64
 7   JUN            641 non-null    float64
 8   JUL            641 non-null    float64
 9   AUG            641 non-null    float64
 10  SEP            641 non-null    float64
 11  OCT            641 non-null    float64
 12  NOV            641 non-null    float64
 13  DEC            641 non-null    float64
 14  ANNUAL         641 non-null    float64
 15  Jan-Feb        641 non-null    float64
 16  Mar-May        641 non-null    float64
 17  Jun-Sep        641 non-null    float64
 18  Oct-Dec        641 non-null    float64
dtypes: float64(17), object(2)
memory usage: 95.3+ KB
In [96]:
district.head()
Out[96]:
STATE_UT_NAME DISTRICT JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC ANNUAL Jan-Feb Mar-May Jun-Sep Oct-Dec
0 ANDAMAN And NICOBAR ISLANDS NICOBAR 107.3 57.9 65.2 117.0 358.5 295.5 285.0 271.9 354.8 326.0 315.2 250.9 2805.2 165.2 540.7 1207.2 892.1
1 ANDAMAN And NICOBAR ISLANDS SOUTH ANDAMAN 43.7 26.0 18.6 90.5 374.4 457.2 421.3 423.1 455.6 301.2 275.8 128.3 3015.7 69.7 483.5 1757.2 705.3
2 ANDAMAN And NICOBAR ISLANDS N & M ANDAMAN 32.7 15.9 8.6 53.4 343.6 503.3 465.4 460.9 454.8 276.1 198.6 100.0 2913.3 48.6 405.6 1884.4 574.7
3 ARUNACHAL PRADESH LOHIT 42.2 80.8 176.4 358.5 306.4 447.0 660.1 427.8 313.6 167.1 34.1 29.8 3043.8 123.0 841.3 1848.5 231.0
4 ARUNACHAL PRADESH EAST SIANG 33.3 79.5 105.9 216.5 323.0 738.3 990.9 711.2 568.0 206.9 29.5 31.7 4034.7 112.8 645.4 3008.4 268.1
In [97]:
district[['DISTRICT', 'JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']].groupby("DISTRICT").mean()[:40].plot.barh(stacked=True,figsize=(13,8));
No description has been provided for this image
In [98]:
district[['DISTRICT', 'Jan-Feb', 'Mar-May',
       'Jun-Sep', 'Oct-Dec']].groupby("DISTRICT").sum()[:40].plot.barh(stacked=True,figsize=(16,8));
No description has been provided for this image

Observations¶

  • The above two graphs shows the distribution of over each district.
  • As there are large number of districts only 40 were shown in the graphs.

Andhra Pradesh Data

In [99]:
ap_data = district[district['STATE_UT_NAME'] == 'ANDHRA PRADESH']
In [100]:
ap_data[['DISTRICT', 'JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']].groupby("DISTRICT").mean()[:40].plot.barh(stacked=True,figsize=(18,8));
No description has been provided for this image
In [101]:
ap_data[['DISTRICT', 'Jan-Feb', 'Mar-May',
       'Jun-Sep', 'Oct-Dec']].groupby("DISTRICT").sum()[:40].plot.barh(stacked=True,figsize=(16,8));
No description has been provided for this image

Observations¶

  • The above two graphs shows the distribution of over each district in Andhra Pradesh.
  • The above graphs show that more amount of rainfall is found in srikakulam district, least amount of rainfall is found in Anantapur district.
  • It also shows that almost all states have more amount of rainfall have more amount of rainfall in the months june, july, september.
In [102]:
plt.figure(figsize=(11,4))
sns.heatmap(ap_data[['Jan-Feb','Mar-May','Jun-Sep','Oct-Dec','ANNUAL']].corr(),annot=True)
plt.show()
No description has been provided for this image
In [103]:
plt.figure(figsize=(11,4))
sns.heatmap(ap_data[['JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC','ANNUAL']].corr(),annot=True)
plt.show()
No description has been provided for this image

Observations¶

  • It is observed that in Andhra Pradesh, annual rainfall depends more in the months of january, febuary.
  • It also shows that if there is rainfall in months march, april, may then there is less amount of rainfall in the months june, july, august, september.

Predictions¶

  • We used the same types of models and evaluation metrics used for the above dataset.
  • We also tested the amount of rainfall in hyderabad by models trained on complete dataset and andhra pradesh dataset.
In [104]:
# testing and training for the complete data
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

division_data = np.asarray(district[['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']])

X = None; y = None
for i in range(division_data.shape[1]-3):
    if X is None:
        X = division_data[:, i:i+3]
        y = division_data[:, i+3]
    else:
        X = np.concatenate((X, division_data[:, i:i+3]), axis=0)
        y = np.concatenate((y, division_data[:, i+3]), axis=0)
        
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
In [105]:
temp = district[['DISTRICT','JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL','AUG', 'SEP', 'OCT', 'NOV', 'DEC']].loc[district['STATE_UT_NAME'] == 'ANDHRA PRADESH']
hyd = np.asarray(temp[['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL','AUG', 'SEP', 'OCT', 'NOV', 'DEC']].loc[temp['DISTRICT'] == 'HYDERABAD'])
# print temp
X_year = None; y_year = None
for i in range(hyd.shape[1]-3):
    if X_year is None:
        X_year = hyd[:, i:i+3]
        y_year = hyd[:, i+3]
    else:
        X_year = np.concatenate((X_year, hyd[:, i:i+3]), axis=0)
        y_year = np.concatenate((y_year, hyd[:, i+3]), axis=0)
 
In [106]:
from sklearn import linear_model

# linear model
reg = linear_model.ElasticNet(alpha=0.5)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
print (mean_absolute_error(y_test, y_pred))
57.08862331011229
In [107]:
y_year_pred = reg.predict(X_year)
print ("MEAN Hyderabad")
print (np.mean(y_year),np.mean(y_year_pred))
print ("Standard deviation hyderabad")
print (np.sqrt(np.var(y_year)),np.sqrt(np.var(y_year_pred)))

plot_graphs(y_year,y_year_pred,"Prediction in Hyderabad")
MEAN Hyderabad
91.48888888888888 108.2025052233288
Standard deviation hyderabad
69.2514651982091 58.90326979488765
No description has been provided for this image
In [108]:
from sklearn.svm import SVR

# SVM model
clf = SVR(gamma='auto', C=0.1, epsilon=0.2)
clf.fit(X_train, y_train) 
y_pred = clf.predict(X_test)
print (mean_absolute_error(y_test, y_pred))
116.60671510825178
In [109]:
y_year_pred = clf.predict(X_year)
print ("MEAN Hyderabad")
print (np.mean(y_year),np.mean(y_year_pred))
print ("Standard deviation hyderabad")
print (np.sqrt(np.var(y_year)),np.sqrt(np.var(y_year_pred)))

plot_graphs(y_year,y_year_pred,"Prediction in Hyderabad")
MEAN Hyderabad
91.48888888888888 80.34903236716154
Standard deviation hyderabad
69.2514651982091 0.14736007434982146
No description has been provided for this image
In [110]:
model.fit(x=np.expand_dims(X_train, axis=2), y=y_train, batch_size=64, epochs=10, verbose=1, validation_split=0.1, shuffle=True)
y_pred = model.predict(np.expand_dims(X_test, axis=2))
print (mean_absolute_error(y_test, y_pred))
Epoch 1/10
65/65 [==============================] - 0s 6ms/step - loss: 7410.1948 - mae: 53.1640 - val_loss: 3912.4832 - val_mae: 41.4290
Epoch 2/10
65/65 [==============================] - 0s 6ms/step - loss: 5438.0537 - mae: 44.2064 - val_loss: 3657.5393 - val_mae: 37.6070
Epoch 3/10
65/65 [==============================] - 0s 6ms/step - loss: 5199.6011 - mae: 42.8950 - val_loss: 3673.1670 - val_mae: 37.0153
Epoch 4/10
65/65 [==============================] - 0s 6ms/step - loss: 5046.6323 - mae: 41.5777 - val_loss: 3512.6877 - val_mae: 36.8002
Epoch 5/10
65/65 [==============================] - 1s 10ms/step - loss: 5012.4092 - mae: 41.7169 - val_loss: 3471.7871 - val_mae: 36.1732
Epoch 6/10
65/65 [==============================] - 0s 6ms/step - loss: 4903.3105 - mae: 41.2504 - val_loss: 3802.2419 - val_mae: 36.2144
Epoch 7/10
65/65 [==============================] - 0s 6ms/step - loss: 4838.3657 - mae: 40.7725 - val_loss: 3813.9683 - val_mae: 37.3677
Epoch 8/10
65/65 [==============================] - 0s 6ms/step - loss: 4896.7153 - mae: 41.1777 - val_loss: 3616.3206 - val_mae: 36.5161
Epoch 9/10
65/65 [==============================] - 0s 6ms/step - loss: 4763.4482 - mae: 40.3536 - val_loss: 4103.4834 - val_mae: 38.9619
Epoch 10/10
65/65 [==============================] - 0s 6ms/step - loss: 4733.0625 - mae: 40.5106 - val_loss: 3609.8958 - val_mae: 37.2121
37/37 [==============================] - 0s 3ms/step
42.28625546922097
In [111]:
y_year_pred = model.predict(np.expand_dims(X_year, axis=2))
print ("MEAN Hyderabad")
print (np.mean(y_year),np.mean(y_year_pred))
print ("Standard deviation hyderabad")
print (np.sqrt(np.var(y_year)),np.sqrt(np.var(y_year_pred)))


# plot_graphs(y_year,y_year_pred,"Prediction in Hyderabad")
1/1 [==============================] - 0s 41ms/step
MEAN Hyderabad
91.48888888888888 108.403656
Standard deviation hyderabad
69.2514651982091 74.833984
In [112]:
# training and testing sets for only andhra pradesh data
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

division_data = np.asarray(ap_data[['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
       'AUG', 'SEP', 'OCT', 'NOV', 'DEC']])

X = None; y = None
for i in range(division_data.shape[1]-3):
    if X is None:
        X = division_data[:, i:i+3]
        y = division_data[:, i+3]
    else:
        X = np.concatenate((X, division_data[:, i:i+3]), axis=0)
        y = np.concatenate((y, division_data[:, i+3]), axis=0)
        
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
In [113]:
from sklearn import linear_model

# linear model
reg = linear_model.ElasticNet(alpha=0.5)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
print (mean_absolute_error(y_test, y_pred))
31.249748674622488
In [114]:
y_year_pred = reg.predict(X_year)
print ("MEAN Hyderabad")
print (np.mean(y_year),np.mean(y_year_pred))
print ("Standard deviation hyderabad")
print (np.sqrt(np.var(y_year)),np.sqrt(np.var(y_year_pred)))
plot_graphs(y_year,y_year_pred,"Prediction in Hyderabad")
MEAN Hyderabad
91.48888888888888 96.5489199306844
Standard deviation hyderabad
69.2514651982091 60.819355195446896
No description has been provided for this image
In [115]:
from sklearn.svm import SVR

# SVM model
clf = SVR(gamma='auto', C=0.1, epsilon=0.2)
clf.fit(X_train, y_train) 
y_pred = clf.predict(X_test)
print (mean_absolute_error(y_test, y_pred))
59.35057496896855
In [116]:
y_year_pred = clf.predict(X_year)
print ("MEAN Hyderabad")
print (np.mean(y_year),np.mean(y_year_pred))
print ("Standard deviation hyderabad")
print (np.sqrt(np.var(y_year)),np.sqrt(np.var(y_year_pred)))
plot_graphs(y_year,y_year_pred,"Prediction in Hyderabad")
MEAN Hyderabad
91.48888888888888 95.89978206795146
Standard deviation hyderabad
69.2514651982091 0.09247315036320868
No description has been provided for this image
In [117]:
model.fit(x=np.expand_dims(X_train, axis=2), y=y_train, batch_size=64, epochs=10, verbose=1, validation_split=0.1, shuffle=True)
y_pred = model.predict(np.expand_dims(X_test, axis=2))
print (mean_absolute_error(y_test, y_pred))
Epoch 1/10
3/3 [==============================] - 0s 51ms/step - loss: 1881.6719 - mae: 32.7400 - val_loss: 1128.7218 - val_mae: 23.7975
Epoch 2/10
3/3 [==============================] - 0s 25ms/step - loss: 1729.9153 - mae: 30.2609 - val_loss: 1073.9745 - val_mae: 24.3387
Epoch 3/10
3/3 [==============================] - 0s 37ms/step - loss: 1635.9270 - mae: 28.6857 - val_loss: 1071.7965 - val_mae: 25.3617
Epoch 4/10
3/3 [==============================] - 0s 34ms/step - loss: 1578.2692 - mae: 27.8114 - val_loss: 1044.4392 - val_mae: 25.4608
Epoch 5/10
3/3 [==============================] - 0s 30ms/step - loss: 1514.2677 - mae: 27.4533 - val_loss: 1018.0604 - val_mae: 25.2427
Epoch 6/10
3/3 [==============================] - 0s 29ms/step - loss: 1441.3817 - mae: 26.9870 - val_loss: 1004.6213 - val_mae: 24.9050
Epoch 7/10
3/3 [==============================] - 0s 29ms/step - loss: 1392.2823 - mae: 26.9143 - val_loss: 1011.1804 - val_mae: 24.7493
Epoch 8/10
3/3 [==============================] - 0s 28ms/step - loss: 1367.8698 - mae: 27.1685 - val_loss: 1023.4977 - val_mae: 24.7469
Epoch 9/10
3/3 [==============================] - 0s 29ms/step - loss: 1349.9089 - mae: 27.1171 - val_loss: 1032.5244 - val_mae: 24.6267
Epoch 10/10
3/3 [==============================] - 0s 28ms/step - loss: 1333.8815 - mae: 26.9949 - val_loss: 1036.2633 - val_mae: 24.1076
2/2 [==============================] - 0s 4ms/step
34.32628827776229
In [118]:
y_year_pred = model.predict(np.expand_dims(X_year, axis=2))
print ("MEAN Hyderabad")
print (np.mean(y_year),np.mean(y_year_pred))
print ("Standard deviation hyderabad")
print (np.sqrt(np.var(y_year)),np.sqrt(np.var(y_year_pred)))
# plot_graphs(y_train,y_year_pred,"Prediction in Hyderabad")
1/1 [==============================] - 0s 22ms/step
MEAN Hyderabad
91.48888888888888 100.43834
Standard deviation hyderabad
69.2514651982091 63.309994

Prediction Observations¶

Training on complete dataset¶

Algorithm MAE
Linear Regression 57.08862331011236
SVR 116.60671510825178
Artificial neural nets 44.329664907381066

Training on Hyderabad dataset¶

Algorithm MAE
Linear Regression 31.249748674622477
SVR 59.35057496896855
Artificial neural nets 31.0601823988415
  • Neural Networks performs better than SVR etc.
  • Bad performance by SVR model.
  • Andhra Pradesh data has a single pattern that can be learned by models, rather than learning different patterns of all states. so has high accuracy.
  • Analysed individual year rainfall patterns for Hyderabad district.
  • Approximately close means, noticed close standard deviations.

Conclusions¶

  • Various visualizations of data are observed which helps in implementing the approaches for prediction.
  • Prediction of amount of rainfall for both the types of dataset.
  • Observations indicates machine learning models won't work well for prediction of rainfall due to fluctutaions in rainfall.

Technologies¶

  • Programming language : Python ,Streamlit
  • Libraries : numpy, pandas, matplotlib, seaborn, keras, scipy, sklearn
  • Github repo: link